Querying data set tables in a non-transactional database

ABSTRACT

A method and apparatus for facilitating data set queries is disclosed. In the method and apparatus, one or more tables may be created for the data set, whereby each table of the one or more tables may enable searching the data set using one or more records that are associated with one or more indices of the data set. Upon receiving a request to search the data set, a table of the one or more tables is identified based at least in part on one or more bases for query associated with the request and is searched to provide a yielded record.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 14/147,282, filed on Jan. 3, 2014, entitled “QUERYING DATA SET TABLES IN A NON-TRANSACTIONAL DATABASE,” the content of which is incorporated by reference herein in its entirety.

BACKGROUND

Database systems have popular applications in many fields, including the Internet and electronic commerce spaces. In many applications, database systems are used to maintain important information about products, customers and the like. Further, with the growth of Internet and e-commerce applications, among others, it is becoming increasingly necessary for database systems to be able to handle ever-growing data sets. It is also becoming increasingly important for database systems to be capable of providing features such as the ability to perform expedient data queries and the ability to provide a large data throughput. However, as the size of data sets maintained by the database systems increases, certain database systems may not scale optimally and may incur performance penalties. Further, as the size of data sets increases, the throughput achieved by some database systems decreases and the computational cost of performing data queries, data updates and other operations increases.

Accordingly, it is challenging to provide a robust query capability on a large set of data. It is also challenging to enable updating records of a set of data while ensuring that conflicts between the records are avoided.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1 shows an example of enabling data set queries using a NoSQL database in accordance with at least one embodiment;

FIG. 2 shows an example of an environment for managing a data set in a database service in accordance with at least one embodiment;

FIG. 3 shows an example of a schema for a table in accordance with at least one embodiment;

FIG. 4 shows an example of non-relational database tables in accordance with at least one embodiment;

FIG. 5 shows an example of a system for transacting with a NoSQL database in accordance with at least one embodiment;

FIG. 6 shows an example of a method for persisting a NoSQL database table in accordance with at least one embodiment;

FIG. 7 shows an example of a method for creating a NoSQL database table in accordance with at least one embodiment;

FIG. 8 shows an example of a method for distinguishing records in accordance with at least one embodiment; and

FIG. 9 illustrates an environment in which various embodiments can be implemented.

DETAILED DESCRIPTION

In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.

Techniques described and suggested herein include facilitating the searching of a data set based at least in part on one or more indices of the data set. A data set may include a plurality of records, whereby the plurality of records may be organized according to a plurality of indices and each record of the plurality of records may belong to an index of the plurality of indices. As described herein, an index may be a column of the data set or a data field of the data set. The data set may have a pre-specified schema, whereby each row of the data set may have a corresponding value that populates each index of the plurality of indices. However, in alternative embodiments, the data set may be schema-less, whereby it may not be required for each index of the data set to be populated for each row. For example, in a schema-less data set, one row may have corresponding records in a first set of indices, whereas another row may have corresponding records in a second set of indices. The first set of indices and the second set of indices may or may not overlap.

The data set may be sought to be queried based at least in part on a plurality of bases for query. The data set (also referred to herein as a primary data set or a primary table) may be stored in a database. A basis for query may be an index of the data set utilized to query one or more other indices of the data set. When a data set is queried using one basis for query, a data set record or a range of data set records (for example, values expected to be found in one data set column) is provided to form the basis for query, and rows having a matching record are provided as the query result.

When the data set is retained in a table of a transactional database, the transactional database may offer elaborate query capabilities. For example, the transactional database may enable querying the data set by providing three data records or ranges of records, whereby each record may belong to a different index of the data set. As a result of the query, all data set rows whose corresponding indices have records that match the queried records are provided. Although transactional databases provide greater flexibility with respect to query options, they are computationally expensive and may not scale optimally as the size of the data set increases; as the data set grows, the throughput provided by a transactional database decreases. Conversely, a non-transactional database, such as a NoSQL database, may offer greater scalability as the size of the data set increases, at the expense of support for more complex queries.

A non-transactional database table may enable queries to be performed based at least in part on two bases for query. Each non-transactional database table may have a primary key comprising a hash key and a range key, and any two indices of a data set may be designated as the hash key and the range key. The remaining indices of the table may be designated as secondary indices. The table may be queried by providing a value or a range of values for the hash key and a value or a range of values for the range key. The result of the query may be the remaining indices designated as secondary indices.
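The hash-key/range-key query described above can be illustrated with a minimal in-memory sketch. The class `HashRangeTable` and its methods are hypothetical and do not represent any particular database's API. The sketch assumes that the hash key is matched exactly (hash partitioning typically supports only equality) while the range key accepts an inclusive range, and that writing an item with an existing (hash, range) pair replaces the earlier item.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class HashRangeTable:
    """Hypothetical in-memory stand-in for a non-transactional table whose
    primary key is a (hash key, range key) pair."""

    def __init__(self, hash_key, range_key):
        self.hash_key = hash_key
        self.range_key = range_key
        # hash value -> (sorted range values, items in the same order)
        self._partitions = defaultdict(lambda: ([], []))

    def put(self, item):
        keys, items = self._partitions[item[self.hash_key]]
        rng = item[self.range_key]
        pos = bisect_left(keys, rng)
        if pos < len(keys) and keys[pos] == rng:
            items[pos] = item  # same primary key: the new item replaces the old one
        else:
            keys.insert(pos, rng)
            items.insert(pos, item)

    def query(self, hash_value, low, high):
        """Exact match on the hash key, inclusive range on the range key.
        Each returned item carries its remaining (secondary) attributes."""
        keys, items = self._partitions.get(hash_value, ([], []))
        return items[bisect_left(keys, low):bisect_right(keys, high)]


table = HashRangeTable("buyer_id", "creation_time")
table.put({"buyer_id": "B1", "creation_time": "2014-01-02", "contract_id": "C1"})
table.put({"buyer_id": "B1", "creation_time": "2014-01-05", "contract_id": "C2"})
table.put({"buyer_id": "B2", "creation_time": "2014-01-03", "contract_id": "C3"})
print([i["contract_id"] for i in table.query("B1", "2014-01-01", "2014-01-03")])  # ['C1']
```

Keeping each partition sorted by range value lets a range query be answered with two binary searches instead of a scan of the partition.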

To enable a primary data set to become searchable based at least in part on more than two bases for query, two or more tables may be persisted for the primary data set in a non-transactional database. A persisted table (also referred to herein as a secondary table) may be persisted by storing entries of the primary data set in a non-transactional database, whereby the entries may populate rows of the table. Accordingly, the table may be maintained in the database, and searches may be performed on the table. Each persisted table may be searched based at least in part on two bases for query. Accordingly, by constructing more than one table for the primary data set, added query flexibility may be attained. Further, the two bases for query for the tables may be different or may have a basis for query in common. It is noted that when a succession of queries is performed on the various tables of the primary data set, a multi-indexed query, as provided, for example, by a transactional database, may be achieved.
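The following sketch illustrates how a succession of two-basis queries against two secondary tables can answer a three-basis query, here the contracts of one buyer whose creation time and update time both fall within given ranges. The table layout, the helper names `persist_table`, `query` and `multi_index_query`, and the sample rows are hypothetical; intersecting contract identities is one assumed way of combining the two results.

```python
from collections import defaultdict

# Hypothetical primary data set; each row maps index names to values.
PRIMARY = [
    {"buyer_id": "B1", "contract_id": "C1", "created": "2014-01-02", "updated": "2014-01-09"},
    {"buyer_id": "B1", "contract_id": "C2", "created": "2014-01-05", "updated": "2014-01-06"},
    {"buyer_id": "B1", "contract_id": "C3", "created": "2014-01-20", "updated": "2014-01-21"},
]


def persist_table(rows, hash_key, range_key):
    """Simplified secondary table: hash value -> rows sorted by range value."""
    data = defaultdict(list)
    for row in rows:
        data[row[hash_key]].append(row)
    for partition in data.values():
        partition.sort(key=lambda r: r[range_key])
    return {"range_key": range_key, "data": data}


def query(table, hash_value, low, high):
    """Two-basis query: exact hash value, inclusive range on the range key."""
    rk = table["range_key"]
    return [r for r in table["data"].get(hash_value, []) if low <= r[rk] <= high]


by_created = persist_table(PRIMARY, "buyer_id", "created")
by_updated = persist_table(PRIMARY, "buyer_id", "updated")


def multi_index_query(buyer, created_range, updated_range):
    """Three-basis query answered by a succession of two-basis queries."""
    first = {r["contract_id"] for r in query(by_created, buyer, *created_range)}
    second = {r["contract_id"] for r in query(by_updated, buyer, *updated_range)}
    return sorted(first & second)


print(multi_index_query("B1", ("2014-01-01", "2014-01-10"), ("2014-01-01", "2014-01-07")))
# ['C2']
```

Contract C1 satisfies the creation-time range but not the update-time range, so only C2 survives the intersection.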

In some embodiments, a contract between a buyer and a seller may be recorded in a primary data set, whereby, to record the contract, the primary data set may include an index for a buyer ID or a seller ID, an index for a contract ID, an index for a creation time associated with the contract, an index for a time at which the contract is updated, an index for a status of the contract and an index for the type of the contract. To cause the primary data set to be searchable for a contract identity based at least in part on the buyer ID and the creation time, a table of the data set may be persisted in the non-transactional database, and the index for the buyer ID may be designated as the hash key and the index for the creation time may be designated as the range key. Further, to cause the data set to be searchable for a contract identity based at least in part on the buyer ID and the update time, a second table of the data set may be persisted in the non-transactional database, and the index for the buyer ID may be designated as the hash key and the index for the update time may be designated as the range key. Similarly, additional tables may be persisted to facilitate other types of queries.

Upon receiving a request to query the data set, the request may be processed to determine the one or more bases for query associated with the request. For example, if the request specifies a buyer ID and a creation time for a contract and requires the contract identity as a query outcome, the first created table may be queried, because the first created table is capable of facilitating searching the data set on the basis of buyer ID and creation time.
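Selecting the table that serves a request could look like the following sketch, which compares the request's bases for query against each table's (hash key, range key) pair. `TableSpec`, `select_table` and the table names are hypothetical; a fuller implementation might also accept requests that supply only the hash key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical description of one persisted secondary table."""
    name: str
    hash_key: str
    range_key: str


# The two tables from the contract example above.
TABLES = (
    TableSpec("by_buyer_created", "buyer_id", "creation_time"),
    TableSpec("by_buyer_updated", "buyer_id", "update_time"),
)


def select_table(bases, tables=TABLES):
    """Return the table whose (hash key, range key) pair matches the request's bases."""
    wanted = set(bases)
    for spec in tables:
        if wanted == {spec.hash_key, spec.range_key}:
            return spec
    raise LookupError(f"no persisted table supports bases {sorted(wanted)}")


print(select_table({"buyer_id", "creation_time"}).name)  # by_buyer_created
```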

FIG. 1 shows an example of enabling data set queries using a NoSQL database in accordance with at least one embodiment. A data set 104, which may be populated by a plurality of records belonging to a plurality of indices, is stored in a NoSQL database 102. Further, two tables 106, 108 for the primary data set 104 are persisted in the NoSQL database 102 in order to facilitate queries based at least in part on one or more bases for query. As persisted in the first table 106, the data set 104 may be searched using two bases for query, whereby each basis for query corresponds to an index of the data set 104. Similarly, as persisted in the second table 108, the data set 104 may be queried using two bases for query. In the illustrative example of FIG. 1, the bases for query of the first table 106 are the second and third indices of the four indices of the data set 104, whereas the bases for query of the second table 108 are the second and fourth indices. By creating two tables for the data set 104 in the NoSQL database 102, the data set 104 may be flexibly queried based on a variety of indices. For example, utilizing the first table 106, records belonging to the first index and fourth index may be queried based at least in part on records belonging to the second and third indices. Further, utilizing the second table 108, records belonging to the first index and third index may be queried based at least in part on records belonging to the second and fourth indices.

FIG. 2 shows an example of an environment for managing a data set in a database service in accordance with at least one embodiment. In the environment 200, a database management entity 204 communicates with a database service 202. The database management entity 204 may comprise a collection of computing resources collectively configured to operate and manage a table maintained by the database service 202. For example, the database management entity 204 may be at least one computing device that is configured to transmit commands to the database service 202 to operate and manage a table maintained by the database service 202. The table may include a plurality of records of a data set, and the database management entity 204 may cause the table to be stored in the database service 202. The database management entity 204 may further specify the bases for query and secondary indices associated with the table. The database management entity 204 may also initiate performing queries on the table, for example, by providing one or more records associated with one or more bases for query for the table, and may enable providing query results to a requesting party. In addition, the database management entity 204 may enable modifying or updating one or more records maintained in the table.

The database service 202 may comprise a collection of computing resources collectively configured to persist one or more tables and cause the one or more tables to be searchable. For example, the database service 202 may be at least one computing device that is configured to receive commands from the database management entity 204 and operate to maintain the one or more tables in accordance with the received commands. The database service 202 may receive appropriately configured API calls specifying actions to be performed on the one or more tables, such as persisting a database table and modifying the table as desired. The tables may have any number of attributes and may or may not have a fixed schema. In addition, the database tables may be non-relational or NoSQL tables. Further, the database service 202 may support records of any data type including, but not limited to, strings, numbers, binary data and sets.

In the environment illustrated in FIG. 2, a queue service 206 is included. The queue service 206 may comprise a collection of computing resources collectively configured to provide one or more queues for transmitting notifications, tasks, events, messages or data. For example, the queue service 206 may be at least one computing device that is configured to retain notifications, tasks, events, messages or data for a pre-specified amount of time. The queue service 206 may receive a notification from a notification service of a task that is sought to be performed on a table maintained by the database service 202. The queue service 206 may place the notification in a queue. As described herein, the tasks may include modifying an entry of a primary data set that is persisted in the database service 202 or of one or more secondary tables that are persisted in the database service 202. After the notification is added to the queue, the notification may be received by the database management entity 204, which may, in turn, cause the execution of the task. After execution, the notification may be removed from the queue. The queue service 206 may enable data locking, whereby after a notification is pulled from the queue, the notification may be deemed locked and may become inaccessible to prevent duplicate execution of the task.
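A minimal sketch of the queue-side locking described above follows. It assumes, as an addition not stated in the text, that a lock expires after a timeout so that a notification whose consumer fails can be received again. `LockingQueue` and its methods are hypothetical; the injectable clock only makes the behavior easy to test.

```python
import itertools
import time


class LockingQueue:
    """Hypothetical queue in which a received notification is locked (hidden
    from other consumers) until it is deleted or its lock expires."""

    def __init__(self, lock_seconds=30.0, clock=time.monotonic):
        self._lock_seconds = lock_seconds
        self._clock = clock
        self._ids = itertools.count(1)
        self._messages = {}  # id -> [body, locked_until]; dicts keep insertion order

    def enqueue(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, float("-inf")]  # unlocked
        return msg_id

    def receive(self):
        """Return (id, body) of the oldest unlocked notification and lock it."""
        now = self._clock()
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self._lock_seconds
                return msg_id, entry[0]
        return None  # nothing is currently available

    def delete(self, msg_id):
        """Remove a notification once its task has been executed."""
        self._messages.pop(msg_id, None)
```

A consumer calls `receive`, executes the task and then calls `delete`; while the lock is held, a second consumer cannot obtain the same notification.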

In the environment illustrated in FIG. 2, a notification service 208 is included. The notification service 208 may comprise a collection of computing resources collectively configured to configure topics for which the database service 202 seeks to be notified. For example, the notification service 208 may be at least one computing device that is configured to cause the database management entity 204 to be notified when a task for execution is received. The notification service 208 may cause the delivery of messages using any protocol (for example, hypertext transfer protocol (HTTP), e-mail and short message service (SMS), among others). The notification service 208 may provide notifications using a “push” mechanism without the need to periodically check or “poll” for new information and updates.
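The push mechanism can be sketched as a topic that invokes its subscribers' callbacks whenever a message is published, so that no subscriber has to poll. The `Topic` class is a hypothetical illustration, not the interface of any particular notification service, and delivery over HTTP, e-mail or SMS is abstracted into a callback.

```python
class Topic:
    """Hypothetical push-style topic: every published message is delivered to
    each subscriber's callback, so subscribers never poll."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        """Push the message to all current subscribers; return how many received it."""
        subscribers = list(self._subscribers)  # snapshot in case a callback subscribes
        for callback in subscribers:
            callback(self.name, message)
        return len(subscribers)
```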

For example, a primary data set that is stored in the database service 202 may have one or more associated secondary tables that are also stored in the database service 202 to facilitate performing a query on the data set. If a task, such as modifying or adding an entry to the data set, is sought to be performed, a notification is issued to the database management entity 204. The notification may be queued in a queue of the queue service 206, whereby the database management entity 204 may retrieve the notification. Further, in accordance with the notification, the database management entity 204 may update the primary data set. After the primary data set is updated, a second notification may be issued, for example, by the notification service 208, to update the one or more associated secondary tables in accordance with the update performed on the primary data set. The second notification may also be queued (for example, in the same queue as the prior notification) and may be retrieved by the database management entity 204 from the queue. After retrieval, the database management entity 204 causes the one or more secondary tables maintained by the database service 202 to be updated.
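The two-stage update flow above (update the primary data set, then issue a second notification that updates the secondary tables) is sketched below with an in-process deque standing in for the queue service. The store names, the notification tuples and the single secondary table keyed by (buyer ID, update time) are assumptions made for illustration.

```python
from collections import deque

# Hypothetical stores: the primary data set keyed by contract ID and one
# secondary table mapping (buyer_id, update_time) to a contract ID.
primary = {}
by_buyer_updated = {}
notifications = deque()  # stands in for the queue of the queue service


def handle(notification):
    kind, payload = notification
    if kind == "update_primary":
        old = primary.get(payload["contract_id"])
        primary[payload["contract_id"]] = payload
        # Second notification: bring the secondary table in line with the primary.
        notifications.append(("update_secondary", {"old": old, "new": payload}))
    elif kind == "update_secondary":
        old, new = payload["old"], payload["new"]
        if old is not None:
            old_key = (old["buyer_id"], old["update_time"])
            if by_buyer_updated.get(old_key) == old["contract_id"]:
                del by_buyer_updated[old_key]  # drop the stale secondary entry
        by_buyer_updated[(new["buyer_id"], new["update_time"])] = new["contract_id"]


def drain():
    """Process notifications, including those issued while draining."""
    while notifications:
        handle(notifications.popleft())
```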

FIG. 3 shows an example of a schema for a table in accordance with at least one embodiment. The table 300 has a plurality of indices, of which four indices are shown, and a plurality of filtering parameters, of which two filtering parameters are shown. The four indices are denoted as index 1-4 and referred to herein by the numerals 301-304, respectively. Further, the two filtering parameters are denoted as filtering parameter 1 and filtering parameter 2 and are referenced by the numerals 305, 306, respectively. Each index of the plurality of indices 301-304 may be utilized to capture one or more characteristics associated with an entry of the table 300. The characteristic associated with an entry may be any type of alphanumeric string. In the example of FIG. 3, the first index 301 is used to specify an identity associated with a buyer or a seller, the second index 302 is used to specify an identity associated with a contract or a transaction, the third index 303 is used to specify a creation time associated with the contract or transaction and the fourth index 304 is used to specify an update time associated with the contract or transaction. The first filtering parameter 305 specifies a status associated with the contract or transaction. The status may, for example, be “open” or “closed” and may signify whether the contract or transaction remains open or has been satisfied, respectively. In some embodiments, the status may be represented by one or more bits, whereby each value of the one or more bits may signify a particular contract or transaction status. The second filtering parameter 306 specifies a type of contract or transaction. For example, the type may indicate that the transaction or contract is authorized, refunded or captured. The plurality of filtering parameters 305, 306 may be used to filter the results of any query performed on the entries of the table 300.
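One possible bit-level representation of the two filtering parameters is sketched below. The bit layout (bit 0 for the status, bits 1-2 for the type) and the enumeration values are assumptions chosen only to illustrate that each bit pattern can signify a particular status or type.

```python
from enum import IntEnum


class Status(IntEnum):
    OPEN = 0
    CLOSED = 1


class ContractType(IntEnum):
    AUTHORIZED = 0
    REFUNDED = 1
    CAPTURED = 2


def pack(status, ctype):
    """Bit 0 holds the status and bits 1-2 hold the type (assumed layout)."""
    return int(status) | (int(ctype) << 1)


def unpack(bits):
    return Status(bits & 0b1), ContractType((bits >> 1) & 0b11)


print(pack(Status.CLOSED, ContractType.CAPTURED))  # 5
```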

It is noted that in various embodiments, the number of indices and filtering parameters associated with the table 300 may vary, whereby the number of indices and filtering parameters described with reference to FIG. 3 is shown to facilitate description. It is also noted that in various embodiments, each index or filtering parameter may be utilized to represent different characteristics of data entries of the table 300.

Queries may be performed on the entries of the table 300, whereby, given a specific criterion for the one or more indices, matching table entries may be obtained. Further, the type of database utilized to retain the table 300 may influence the type of queries or the complexity of the queries that are performed on the entries of the table 300. For example, if the table 300 is retained by a relational database (such as a structured query language (SQL) database), a great deal of flexibility may be provided in performing queries on table entries. However, a relational database may be associated with higher operational cost than a non-relational database, such as a NoSQL database. Further, a relational database may not scale as efficiently as a non-relational database, whereby, as the size of data increases, a larger operational burden may be placed on a relational database as compared with a non-relational database.

In some embodiments, it may be desirable for the queries to be based at least in part on parameters associated with two indices. For example, it may be desirable to query a contract or transaction identity based at least in part on one or more buyer identities and one or more creation times (or a range of creation times) as indexed by the third index 303. Further, it may be desirable to query a contract or transaction identity based at least in part on one or more buyer identities and one or more update times (or a range of update times) as indexed by the fourth index 304. Whereas a SQL database may facilitate both queries on a single table, a single non-relational NoSQL table may not, due at least in part to the fact that the queries, taken together, are predicated upon controlling for parameters associated with three indices of the table 300.

FIG. 4 shows an example of non-relational database tables in accordance with at least one embodiment. The non-relational database tables 402-408 may be used to facilitate performing queries on one or more items in the table 300 described with reference to FIG. 3. A non-relational database table may be a collection of items, whereby each item may be a collection of attributes. Further, as compared to a relational database, where a table has a predefined schema including, among others, a primary key, a plurality of columns and a plurality of data types associated with each column, a table in a non-relational database may not require that all records stored in the table have the same set of columns. Further, individual items stored in a table of a non-relational database may have any number of attributes. An attribute may be any name-value pair, whereby the value may be a single value or a multi-valued set.

The non-relational database tables 402-408 are each associated with a primary key made of two attributes. The first attribute is referred to herein as a hash attribute and the second attribute is referred to herein as a range attribute. Items in the non-relational database tables 402-408 may be queried based at least in part on the two attributes of the primary key. For example, table 402, whose hash attribute is the buyer ID and whose range attribute is the creation date, may be searched based at least in part on a buyer ID and a value or a range for the creation date. The search may yield one or more matching contract identities, the update times associated with the contract identities, a status associated with the contract and a contract type.

Similarly, table 408, whose hash attribute is the buyer ID and whose range attribute is the update time, may be searched based at least in part on a buyer ID and a value or a range for the update time. The search may yield one or more matching contract identities, the creation times associated with the contract identities, a status associated with the contract and a contract type.

Accordingly, to render the table 300 described with reference to FIG. 3 searchable on the basis of various indices, one or more non-relational database tables may be created. The one or more non-relational database tables are advantageous in that they are more scalable than a SQL database table. Further, the non-relational database tables are associated with a lower operational cost than a SQL table. As described herein, the first non-relational database table 402 is structured to render the data searchable on the basis of buyer ID and creation time, and the second non-relational database table 404 is structured to render the data searchable on the basis of seller ID and creation time. Further, the third non-relational database table 406 is structured so that contract or transaction identities are searchable on the basis of seller ID and update time, and the fourth non-relational database table 408 is structured so that they are searchable on the basis of buyer ID and update time. It is noted that the number of created non-relational database tables may be increased or decreased depending on the searches that are desired to be made available. For example, the fourth non-relational database table 408 may only be created if searching contracts on the basis of a buyer ID and update time is sought.

In addition to permitting queries on the basis of a primary key comprising two attributes, the results of a search conducted on a non-relational database table may be filtered on the basis of any filtering parameter, such as status or type.

FIG. 5 shows an example of a system for transacting with a NoSQL database in accordance with at least one embodiment. The system 500 includes a non-transactional database 502 that may be used to maintain searchable tables based at least in part on a hash attribute and a range attribute. As described herein, the searchable tables may be created in accordance with the search requirements of the system 500. The system 500 also includes a database management entity 504, which may be executed as a process running on a computer system, that queries the non-transactional database 502 and provides query results to a requesting party. Further, the database management entity 504 may enable creating one or more database tables and updating the data in any one of the database tables by, for example, adding one or more attributes to the tables. For example, the database management entity 504 may enable creating or updating indices for one or more tables of the non-transactional database 502. As described herein, the database management entity 504 further facilitates handling conflicts between records having the same timestamp.

The system 500 includes an event publisher 510. The event publisher 510, which may be implemented as a process running on a computer system, enables publishing indexing events for contracts or transactions when a state update occurs. The state update may be due to a change in an attribute associated with a contract or a transaction that is part of a database table. Further, if a new contract or transaction is added for a buyer or a seller, the event publisher 510 enables the attributes associated with the contract or transaction (for example, contract or transaction identity, creation time, update time, type or status) to be published in the database table. As described herein, an event may be any addition, deletion or modification to any attribute associated with a contract or transaction. When publishing an event, the event publisher 510 may enqueue the event in a queue 508 provided by a queue service. Further, a notification service may be used to notify the queue of a published event, whereby the queue may be subscribed to the notification service and the notification service may notify the queue of events that are to be added to the queue. Further, error handling may also be employed for resolving exceptions. In some embodiments, events may be enqueued in more than one queue. For example, a first queue may be used for state updates pertaining to contracts and a second queue may be used for state updates pertaining to transactions.
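Routing published events to separate queues for contracts and transactions might be sketched as follows. `IndexingEvent`, `EventPublisher` and the two queue names are hypothetical, and rejecting events of an unknown kind is a simple stand-in for the error handling mentioned above.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class IndexingEvent:
    """Hypothetical indexing event published on a state update."""
    kind: str         # "contract" or "transaction"
    entity_id: str
    attributes: dict  # e.g. status, type, update time
    published_at: float = field(default_factory=time.time)


class EventPublisher:
    """Enqueues each event on the queue dedicated to its kind."""

    def __init__(self):
        self.queues = {"contract": deque(), "transaction": deque()}

    def publish(self, event):
        queue = self.queues.get(event.kind)
        if queue is None:
            raise ValueError(f"no queue for event kind {event.kind!r}")
        queue.append(event)
```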

The indexing entity 506, which may be executed as a process running on a computer system, may include event handling logic that receives events from the queue 508 and communicates with the database management entity 504 to cause one or more indices corresponding to the event to be created or updated. In some embodiments, the indexing entity 506 may include two or more event handlers (for example, a first event handler may be associated with a first queue used for state updates pertaining to contracts and a second event handler may be associated with a second queue used for state updates pertaining to transactions). The indexing entity 506 may also include logic dedicated to error handling as described herein.
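An indexing entity with one event handler per queue could be sketched as below. The handler registry, the `failed` list that sets failing events aside for later error handling, and the sample handlers are assumptions for illustration; they do not reproduce the interfaces of the indexing entity 506.

```python
from collections import deque


class IndexingEntity:
    """Hypothetical event-handling loop with one handler per queue."""

    def __init__(self, handlers):
        self.handlers = handlers  # queue name -> callable(event)
        self.failed = []          # (queue name, event, exception) for error handling

    def poll(self, queues):
        """Drain every queue through its handler; return the number of successes."""
        processed = 0
        for name, pending in queues.items():
            handler = self.handlers[name]
            while pending:
                event = pending.popleft()
                try:
                    handler(event)
                    processed += 1
                except Exception as exc:  # set the event aside and keep going
                    self.failed.append((name, event, exc))
        return processed


index = {}


def index_contract(event):
    index[event["id"]] = event["update_time"]  # create or update the index entry


def index_transaction(event):
    raise RuntimeError("index write failed")  # simulated failure


entity = IndexingEntity({"contract": index_contract, "transaction": index_transaction})
queues = {"contract": deque([{"id": "C1", "update_time": "t1"}]),
          "transaction": deque([{"id": "T1"}])}
processed = entity.poll(queues)
```

Recording failures rather than stopping lets events in other queues continue to be indexed while the failed ones await error handling.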

The request processing entity 514 and the query entity 512 are configured to enable submitting queries directed to tables of the non-transactional database 502 and providing one or more results of the queries to a requesting party. The request processing entity 514 may receive requests for query searches from a user or a service. A request may be submitted to the request processing entity 514 using an appropriately configured application programming interface (API) function call. Further, a user interface (UI) may be provided to enable a user (for example, a buyer or a seller) to submit one or more requests to the request processing entity 514. For example, a seller may be provided with a seller UI through which the seller may submit queries to search for contracts or transactions associated with the seller based at least in part on the identity of the seller.

The query entity 512 may be configured to evaluate a received query request and communicate the request to the database management entity 504 in order to retrieve a search result associated with the query from the non-transactional database 502. The query entity 512 may also enable paginating query results, whereby one page of query results may be retrieved at a time. The page size for a query may be specified in the query request or may be otherwise specified. Further, pagination may enable limiting the number of table names included in one page of results. When search results are paginated, tokens may be employed to ensure that a user may receive subsequent pages of search results. For example, after a page of search results is retrieved, a token may be provided to specify the start of a subsequent page of search results.
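Token-based pagination can be sketched by encoding the last key returned on a page into an opaque token and resuming strictly after that key on the next call. The helper names, the base64/JSON token encoding and the assumption that the sort key values are unique are illustrative choices, not details taken from the text.

```python
import base64
import json


def encode_token(last_key):
    """Opaque continuation token carrying the last key of the previous page."""
    return base64.urlsafe_b64encode(json.dumps(last_key).encode()).decode()


def decode_token(token):
    return json.loads(base64.urlsafe_b64decode(token.encode()))


def query_page(sorted_items, key, page_size, token=None):
    """Return (page, next_token); next_token is None when no results remain.

    Assumes sorted_items is ordered by `key` and that key values are unique,
    so the last key on a page identifies where the next page starts."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    start_after = decode_token(token) if token else None
    remaining = [i for i in sorted_items
                 if start_after is None or i[key] > start_after]
    page = remaining[:page_size]
    next_token = encode_token(page[-1][key]) if len(remaining) > page_size else None
    return page, next_token


items = [{"update_time": f"2014-01-0{d}"} for d in range(1, 6)]
page, token = query_page(items, "update_time", 2)
```

Because the token only records where the previous page ended, the server need not keep any per-client pagination state.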

FIG. 6 shows an example of a method for persisting a NoSQL database table in accordance with at least one embodiment. In the process 600, a database service stores 602 a data set, which may be a primary data set. The data set may be any type of user data, such as the table of user data described with reference to numeral 300 in FIG. 3. The data set may have a plurality of columns (also referred to herein as indices) and each column may be associated with one or more data items. For example, the data set may include one or more columns for buyer or seller ID, contract or transaction ID, creation time, update time, status or type. Further, the data set may be populated with information for each column. The data set may have been originally transferred to the database service from another entity (for example, to render the data set searchable). Further, the data set may have been built over time in the database service as items are added to the data set. In addition, the data set may be the result of both a data transfer and the addition of items over time.

A database management entity receives 604 a request for persisting indices for query. The request for persisting indices for query may specify one or more query objects and one or more bases for query. The one or more query objects may indicate data set entries or columns that a user seeks to query. For example, the user may seek to query contract or transaction identities and may accordingly specify that the contract or transaction identities column is the query object sought for the data set. The query object may be part of the query result provided to a user. The one or more bases for query may specify one or more data set indices that form one or more query criteria. Further, the one or more bases for query may be query constraints provided by the user in order to obtain a query object. For example, the bases for query may be specified as a buyer or seller ID and update time, whereby the user may seek to query a contract or transaction identity based at least in part on the provided buyer or seller ID and update time.
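A request for persisting indices (step 604) could be represented and checked against the primary data set's columns as in the sketch below. `PersistRequest` and `validate` are hypothetical names, and the validation rules (known columns only, one or two bases for query to fit a hash key and an optional range key, no column used as both query object and basis) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistRequest:
    """Hypothetical request for persisting indices for query (step 604)."""
    query_objects: tuple    # columns the user wants back, e.g. ("contract_id",)
    bases_for_query: tuple  # columns to search on, e.g. ("seller_id", "update_time")


def validate(request, columns):
    """Reject requests that a hash/range-keyed table could not serve."""
    requested = set(request.query_objects) | set(request.bases_for_query)
    unknown = requested - set(columns)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    if not 1 <= len(request.bases_for_query) <= 2:
        raise ValueError("bases for query must fit a hash key and an optional range key")
    if set(request.query_objects) & set(request.bases_for_query):
        raise ValueError("a column cannot be both a query object and a basis for query")
    return request
```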

The database management entity persists 606 one or more NoSQL database tables in accordance with the query object and bases for query. As described herein, the one or more NoSQL database tables may be constructed such that the primary key is made of one or more bases for query. Further, a secondary index may be created for the query object. After the one or more NoSQL database tables are persisted, the database management entity receives 608 a request to query a persisted table. The database management entity then queries 610 the persisted table based at least in part on the received request.

FIG. 7 shows an example of a method for creating a NoSQL database table in accordance with at least one embodiment. In the process 700, the database management entity persists 702 a table having the one or more bases for query as its primary key. For example, if the user seeks to use both the buyer ID and creation time as the bases for queries, the primary key for the persisted table may have the buyer ID as a hash attribute and the creation time as a hash range. The table may be populated in accordance with the data retrieved from the data set. Further, if more than one set of bases for query is sought, the index adapter may construct more than one table as described herein. For example, as described with reference to FIG. 4, four tables may be created to facilitate various queries.

The database management entity then causes 704 the query object to be associated as a secondary index for the table. As a secondary index, the query object may be provided as a result of searches performed on the primary key. The database management entity also adds 706 supporting filtering parameters to the constructed table. For example, contract or transaction type and contract or transaction status may be added to the table as filtering parameters and be used to filter search results of secondary index data.
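Process 700 can be illustrated with a small in-memory stand-in for a table whose primary key consists of a hash attribute and a hash range. This is a sketch under assumptions, not the disclosed table layout: the class name `IndexTable` and the attribute names are hypothetical, each hash value maps to parallel lists kept sorted by the range attribute so that range searches can use bisection, and filtering parameters are applied after the primary key search, as step 706 describes. The hash/range terminology resembles the hash (partition) key and range (sort) key of stores such as Amazon DynamoDB, but no particular store's API is modeled here.

```python
import bisect
from collections import defaultdict


class IndexTable:
    """Hypothetical table keyed by a hash attribute and a hash range (process 700)."""

    def __init__(self, hash_attr, range_attr, query_object, filter_attrs=()):
        self.hash_attr = hash_attr
        self.range_attr = range_attr
        self.query_object = query_object
        self.filter_attrs = tuple(filter_attrs)
        # hash value -> (sorted range keys, items in the same order)
        self._partitions = defaultdict(lambda: ([], []))

    def put(self, record):
        # Step 704: keep the query object; step 706: keep the filtering parameters.
        item = {self.query_object: record[self.query_object]}
        item.update({attr: record[attr] for attr in self.filter_attrs})
        keys, items = self._partitions[record[self.hash_attr]]
        # Step 702: insert in range-key order so range searches can bisect.
        pos = bisect.bisect_right(keys, record[self.range_attr])
        keys.insert(pos, record[self.range_attr])
        items.insert(pos, item)

    def query(self, hash_value, low, high, **filters):
        """Return query objects with low <= range key <= high that match every filter."""
        if hash_value not in self._partitions:
            return []
        keys, items = self._partitions[hash_value]
        start = bisect.bisect_left(keys, low)
        stop = bisect.bisect_right(keys, high)
        return [
            item[self.query_object]
            for item in items[start:stop]
            if all(item.get(attr) == value for attr, value in filters.items())
        ]
```

With a table built as `IndexTable("buyer_id", "creation_time", "contract_id", ("status", "type"))`, a call such as `query("B1", 100, 300, status="open")` searches buyer `B1` over a creation-time window and then filters the secondary index data by status.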

Following the creation of one or more tables in accordance with at least one embodiment described herein, the index adapter may cause the tables to be updated. The index adapter may receive data that is sought to be added to the data set, such as the creation of an entry for a new contract, and may add the data to the tables. When more than one table is maintained, updating the tables may be coordinated.
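One possible way to coordinate updates across several tables is sketched below. It is an assumption, not the disclosed mechanism: the hypothetical `IndexAdapter` validates a new record against every table before writing anything, so a record that any table cannot index leaves the data set and all tables unchanged, and only then fans the record out to each table. The claims also describe triggering table updates through a notification (for example, via a queue service); that mechanism is not modeled here.

```python
from collections import defaultdict


class IndexAdapter:
    """Hypothetical index adapter that keeps several tables in step with one data set."""

    def __init__(self):
        self.data_set = []
        # name -> (bases, query_object, {basis values: [query objects]})
        self.tables = {}

    def add_table(self, name, bases, query_object):
        # Populate the new table from the records already in the data set.
        table = defaultdict(list)
        for row in self.data_set:
            table[tuple(row[b] for b in bases)].append(row[query_object])
        self.tables[name] = (tuple(bases), query_object, table)

    def add_record(self, record):
        # Check the record against every table before writing anything.
        for bases, query_object, _ in self.tables.values():
            missing = {query_object, *bases} - set(record)
            if missing:
                raise ValueError(f"record lacks columns {sorted(missing)}")
        self.data_set.append(record)
        for bases, query_object, table in self.tables.values():
            table[tuple(record[b] for b in bases)].append(record[query_object])
```

This validate-then-apply ordering only coordinates writes within a single process; a distributed, non-transactional store would need additional safeguards.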

Further, mechanisms for handling conflicts in the data may be employed to guarantee data integrity. Conflict handling may be based at least in part on distinguishing matching data entries, whereby, for example, matching data entries may be entries that have identical values or representations. By way of example, two contracts that are created at the same time may have identical creation time entries. As described herein, matching data entries may be distinguished by appending to each data entry a value or a representation that is different from those appended to the remaining data entries.

In some embodiments, a creation time or update time for a contract or transaction for a buyer or a seller may be recorded in units of milliseconds (ms) with respect to a time reference. For example, a contract or a transaction that is created one day after the time reference may have a creation time of 86400000. To avoid having two or more contracts or transactions with the same creation time, two digits may be appended to the creation time to distinguish the records. The data may be examined to determine the number of matching data entries, and the data entries may be appended in order to distinguish the data.
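The arithmetic behind the example can be checked with a short sketch. One day after the time reference is 24 × 60 × 60 × 1000 = 86,400,000 ms. Assuming the two appended digits are zero-padded decimal digits, appending them to a numeric timestamp is the same as multiplying by 100 and adding the suffix, so 86400000 with suffix 03 becomes 8640000003. The helper names `append_suffix` and `split_suffix` are hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms after the time reference


def append_suffix(timestamp_ms, suffix, digits=2):
    """Append a zero-padded decimal suffix to a millisecond timestamp."""
    base = 10 ** digits
    if not 0 <= suffix < base:
        raise ValueError(f"suffix must fit in {digits} digits")
    # Shifting left by `digits` decimal places and adding the suffix is the numeric
    # equivalent of concatenating the zero-padded digits to the timestamp string.
    return timestamp_ms * base + suffix


def split_suffix(value, digits=2):
    """Recover (timestamp_ms, suffix) from a suffixed value."""
    return divmod(value, 10 ** digits)
```

Because the suffix occupies the low-order digits, suffixed values still sort by creation time first and by suffix second, so range searches on the time attribute keep working after conflicts are resolved.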

FIG. 8 shows an example of a method for distinguishing records in accordance with at least one embodiment. In the process 800, a database management entity receives 802 a record for inclusion in a database table (the record may belong to a primary index of the table). The record may potentially be in conflict with an existing record of the database, whereby the database management entity may be configured to append the record (for example, using one or more digits) to distinguish the record from the existing record. The database management entity queries 804 the database to determine the number of matching records in the database. The query may be designed to yield records that are appended to be distinguished from matching records. For example, if the database management entity resolves record conflicts by appending two digits to a record, the database may be searched for all matching records that are appended with values between 00 and 99 in order to yield all matching records.

The database management entity then appends 806 the received record in accordance with the number of matching records. For example, if three matching records are found and the index adapter appends the records with two-digit numbers that are assigned consecutively, the index adapter may append the received record with the digits 03 to distinguish the record from the existing records that are appended with the digits 00, 01 and 02. On the other hand, if no matching records are found, the database management entity may append the record with the digits 00. The database management entity then incorporates 808 the appended record in the database.
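Process 800 can be sketched as follows, assuming the suffixed timestamps are stored as integers in a sorted list (a stand-in for the primary index; the class name `DistinguishedIndex` is hypothetical). Step 804 becomes a range count over `[t × 100, t × 100 + 99]`, which yields every record whose timestamp matches `t` regardless of its suffix, and step 806 appends that count as the next consecutive suffix. This single-threaded sketch assumes records are never deleted and does not address concurrent writers choosing the same suffix.

```python
import bisect


class DistinguishedIndex:
    """Hypothetical sorted index of suffixed timestamps (process 800)."""

    def __init__(self, digits=2):
        self.base = 10 ** digits
        self.keys = []  # suffixed values, kept sorted

    def count_matches(self, timestamp_ms):
        # Step 804: all records for this timestamp lie in [t*base, t*base + base - 1].
        low = timestamp_ms * self.base
        high = low + self.base - 1
        return bisect.bisect_right(self.keys, high) - bisect.bisect_left(self.keys, low)

    def insert(self, timestamp_ms):
        n = self.count_matches(timestamp_ms)
        if n >= self.base:
            raise OverflowError("no free suffix left for this timestamp")
        key = timestamp_ms * self.base + n  # Step 806: next consecutive suffix.
        bisect.insort(self.keys, key)       # Step 808: incorporate the record.
        return key
```

With three records already stored for 86400000 (suffixes 00, 01 and 02), a fourth call to `insert(86400000)` returns 8640000003, matching the example in the text.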

In some embodiments, the search query performed to determine the number of matching records may be narrowed so as to limit the number of distinguished records. For example, it may be desirable for all transactions associated with a certain seller to be distinguished based at least in part on the update time associated with the transactions. Further, it may be acceptable for two transactions that are associated with different sellers to have matching update times. Accordingly, the database management entity may query only matching update times that pertain to the seller and may exclude update times of other sellers.
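Narrowing the match query to one seller amounts to counting matches per `(seller, update time)` pair rather than per update time alone. The sketch below is illustrative only: the function name `distinguish_per_seller` is hypothetical, transactions are assumed to arrive as `(seller_id, update_time_ms)` pairs in order, and a counter stands in for the narrowed database query. Two transactions of seller `S1` at the same time receive suffixes 00 and 01, while a transaction of seller `S2` at that same time again receives 00.

```python
from collections import Counter


def distinguish_per_seller(transactions, digits=2):
    """Suffix update times so they are unique per seller, not across sellers."""
    base = 10 ** digits
    matches = Counter()  # (seller_id, update_time_ms) -> records seen so far
    result = []
    for seller_id, update_time in transactions:
        scope = (seller_id, update_time)  # the narrowed match query
        n = matches[scope]
        if n >= base:
            raise OverflowError(f"more than {base} matches for {scope}")
        matches[scope] += 1
        result.append((seller_id, update_time * base + n))
    return result
```

Scoping the count this way keeps suffixes small, since the two appended digits only have to separate one seller's transactions rather than those of every seller.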

FIG. 9 illustrates aspects of an example environment 900 for implementing aspects in accordance with various embodiments. As will be appreciated, although a web-based environment is used for purposes of explanation, different environments may be used, as appropriate, to implement various embodiments. The environment includes an electronic client device 902, which can include any appropriate device operable to send and/or receive requests, messages or information over an appropriate network 904 and, in some embodiments, convey information back to a user of the device. Examples of such client devices include personal computers, cell phones, handheld messaging devices, laptop computers, tablet computers, set-top boxes, personal data assistants, embedded computer systems, electronic book readers and the like. The network can include any appropriate network, including an intranet, the Internet, a cellular network, a local area network, a satellite network or any other such network and/or combination thereof. Components used for such a system can depend at least in part upon the type of network and/or environment selected. Protocols and components for communicating via such a network are well known and will not be discussed herein in detail. Communication over the network can be enabled by wired or wireless connections and combinations thereof. In this example, the network includes the Internet, as the environment includes a web server 906 for receiving requests and serving content in response thereto, although for other networks an alternative device serving a similar purpose could be used as would be apparent to one of ordinary skill in the art.

The illustrative environment includes at least one application server 908 and a data store 910. It should be understood that there can be several application servers, layers or other elements, processes or components, which may be chained or otherwise configured, which can interact to perform tasks such as obtaining data from an appropriate data store. Servers, as used herein, may be implemented in various ways, such as hardware devices or virtual computer systems. In some contexts, servers may refer to a programming module being executed on a computer system. As used herein, unless otherwise stated or clear from context, the term “data store” refers to any device or combination of devices capable of storing, accessing and retrieving data, which may include any combination and number of data servers, databases, data storage devices and data storage media, in any standard, distributed, virtual or clustered environment.

The application server can include any appropriate hardware, software and firmware for integrating with the data store as needed to execute aspects of one or more applications for the client device, handling some or all of the data access and business logic for an application. The application server may provide access control services in cooperation with the data store and is able to generate content including, but not limited to, text, graphics, audio, video and/or other content usable to be provided to the user, which may be served to the user by the web server in the form of HyperText Markup Language (“HTML”), Extensible Markup Language (“XML”), JavaScript, Cascading Style Sheets (“CSS”) or another appropriate client-side structured language. Content transferred to a client device may be processed by the client device to provide the content in one or more forms including, but not limited to, forms that are perceptible to the user audibly, visually and/or through other senses including touch, taste, and/or smell. The handling of all requests and responses, as well as the delivery of content between the client device 902 and the application server 908, can be handled by the web server using PHP: Hypertext Preprocessor (“PHP”), Python, Ruby, Perl, Java, HTML, XML or another appropriate server-side structured language in this example. It should be understood that the web and application servers are not required and are merely example components, as structured code discussed herein can be executed on any appropriate device or host machine as discussed elsewhere herein. Further, operations described herein as being performed by a single device may, unless otherwise clear from context, be performed collectively by multiple devices, which may form a distributed and/or virtual system.

The data store 910 can include several separate data tables, databases, data documents, dynamic data storage schemes and/or other data storage mechanisms and media for storing data relating to a particular aspect of the present disclosure. For example, the data store illustrated may include mechanisms for storing production data 912 and user information 916, which can be used to serve content for the production side. The data store also is shown to include a mechanism for storing log data 914, which can be used for reporting, analysis or other such purposes. It should be understood that there can be many other aspects that may need to be stored in the data store, such as page image information and access rights information, which can be stored in any of the above listed mechanisms as appropriate or in additional mechanisms in the data store 910. The data store 910 is operable, through logic associated therewith, to receive instructions from the application server 908 and obtain, update or otherwise process data in response thereto. The application server 908 may provide static, dynamic or a combination of static and dynamic data in response to the received instructions. Dynamic data, such as data used in web logs (blogs), shopping applications, news services and other such applications may be generated by server-side structured languages as described herein or may be provided by a content management system (“CMS”) operating on, or under the control of, the application server. In one example, a user, through a device operated by the user, might submit a search request for a certain type of item. In this case, the data store might access the user information to verify the identity of the user and can access the catalog detail information to obtain information about items of that type. The information then can be returned to the user, such as in a results listing on a web page that the user is able to view via a browser on the user device 902. Information for a particular item of interest can be viewed in a dedicated page or window of the browser. It should be noted, however, that embodiments of the present disclosure are not necessarily limited to the context of web pages, but may be more generally applicable to processing requests in general, where the requests are not necessarily requests for content.

Each server typically will include an operating system that provides executable program instructions for the general administration and operation of that server and typically will include a computer-readable storage medium (e.g., a hard disk, random access memory, read only memory, etc.) storing instructions that, when executed by a processor of the server, allow the server to perform its intended functions. Suitable implementations for the operating system and general functionality of the servers are known or commercially available and are readily implemented by persons having ordinary skill in the art, particularly in light of the disclosure herein.

The environment, in one embodiment, is a distributed and/or virtual computing environment utilizing several computer systems and components that are interconnected via communication links, using one or more computer networks or direct connections. However, it will be appreciated by those of ordinary skill in the art that such a system could operate equally well in a system having fewer or a greater number of components than are illustrated in FIG. 9. Thus, the depiction of the system 900 in FIG. 9 should be taken as being illustrative in nature and not limiting to the scope of the disclosure.

The various embodiments further can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices or processing devices which can be used to operate any of a number of applications. User or client devices can include any of a number of general purpose personal computers, such as desktop, laptop or tablet computers running a standard operating system, as well as cellular, wireless and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system also can include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management. These devices also can include other electronic devices, such as dummy terminals, thin-clients, gaming systems and other devices capable of communicating via a network. These devices also can include virtual devices such as virtual machines, hypervisors and other virtual devices capable of communicating via a network.

Various embodiments of the present disclosure utilize at least one network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially-available protocols, such as Transmission Control Protocol/Internet Protocol (“TCP/IP”), User Datagram Protocol (“UDP”), protocols operating in various layers of the Open Systems Interconnection (“OSI”) model, File Transfer Protocol (“FTP”), Universal Plug and Play (“UPnP”), Network File System (“NFS”), Common Internet File System (“CIFS”) and AppleTalk. The network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, a satellite network and any combination thereof.

In embodiments utilizing a web server, the web server can run any of a variety of server or mid-tier applications, including Hypertext Transfer Protocol (“HTTP”) servers, FTP servers, Common Gateway Interface (“CGI”) servers, data servers, Java servers, Apache servers and business application servers. The server(s) also may be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more web applications that may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C# or C++, or any scripting language, such as Ruby, PHP, Perl, Python or TCL, as well as combinations thereof. The server(s) may also include database servers, including without limitation those commercially available from Oracle®, Microsoft®, Sybase® and IBM® as well as open-source servers such as MySQL, Postgres, SQLite, MongoDB, and any other server capable of storing, retrieving and accessing structured or unstructured data. Database servers may include table-based servers, document-based servers, unstructured servers, relational servers, non-relational servers or combinations of these and/or other database servers.

The environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (“SAN”) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network devices may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (“CPU” or “processor”), at least one input device (e.g., a mouse, keyboard, controller, touch screen or keypad) and at least one output device (e.g., a display device, printer or speaker). Such a system may also include one or more storage devices, such as disk drives, optical storage devices and solid-state storage devices such as random access memory (“RAM”) or read-only memory (“ROM”), as well as removable media devices, memory cards, flash cards, etc.

Such devices also can include a computer-readable storage media reader, a communications device (e.g., a modem, a network card (wireless or wired), an infrared communication device, etc.) and working memory as described above. The computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium, representing remote, local, fixed and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, transmitting and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services or other elements located within at least one working memory device, including an operating system and application programs, such as a client application or web browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets) or both. Further, connection to other computing devices such as network input/output devices may be employed.

Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as, but not limited to, volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules or other data, including RAM, ROM, Electrically Erasable Programmable Read-Only Memory (“EEPROM”), flash memory or other memory technology, Compact Disc Read-Only Memory (“CD-ROM”), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices or any other medium which can be used to store the desired information and which can be accessed by the system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.

Other variations are within the spirit of the present disclosure. Thus, while the disclosed techniques are susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific form or forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions and equivalents falling within the spirit and scope of the invention, as defined in the appended claims.

The use of the terms “a” and “an” and “the” and similar referents in the context of describing the disclosed embodiments (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. The term “connected,” when unmodified and referring to physical connections, is to be construed as partly or wholly contained within, attached to or joined together, even if there is something intervening. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein and each separate value is incorporated into the specification as if it were individually recited herein. The use of the term “set” (e.g., “a set of items”) or “subset” unless otherwise noted or contradicted by context, is to be construed as a nonempty collection comprising one or more members. Further, unless otherwise noted or contradicted by context, the term “subset” of a corresponding set does not necessarily denote a proper subset of the corresponding set, but the subset and the corresponding set may be equal.

Conjunctive language, such as phrases of the form “at least one of A, B, and C,” or “at least one of A, B and C,” unless specifically stated otherwise or otherwise clearly contradicted by context, is otherwise understood with the context as used in general to present that an item, term, etc., may be either A or B or C, or any nonempty subset of the set of A and B and C. For instance, in the illustrative example of a set having three members, the conjunctive phrases “at least one of A, B, and C” and “at least one of A, B and C” refer to any of the following sets: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}. Thus, such conjunctive language is not generally intended to imply that certain embodiments require at least one of A, at least one of B and at least one of C each to be present.

Operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. Processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs or one or more applications) executing collectively on one or more processors, by hardware or combinations thereof. The code may be stored on a computer-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable storage medium may be non-transitory.

The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.

Embodiments of this disclosure are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate and the inventors intend for embodiments of the present disclosure to be practiced otherwise than as specifically described herein. Accordingly, the scope of the present disclosure includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the scope of the present disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.

All references, including publications, patent applications and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.

What is claimed is:
 1. A computer-implemented method for searching a data set, comprising: for a data set that is indexed by a plurality of indices, receiving a request to create a search index for querying the data set, the request specifying a query object and a basis for query for the data set, the query object being a data field to become searchable based at least in part on the basis for the query; constructing a table in a database, the table having the basis as a primary key and the query object as a secondary index; and making the table available to service queries.
 2. The computer-implemented method of claim 1, further comprising: receiving a request to query the data set; identifying the table based at least in part on the basis; searching the table to yield a record of the secondary index; and providing the record of the secondary index.
 3. The computer-implemented method of claim 1, wherein the database is a non-transactional database.
 4. The computer-implemented method of claim 1, further comprising storing the data set.
 5. The computer-implemented method of claim 1, wherein the primary key comprises a hash key and a hash range.
 6. The computer-implemented method of claim 1, further comprising: receiving an update to the data set; and updating the table according to the update.
 7. The computer-implemented method of claim 6, wherein: the method further comprises transmitting a notification of the update; and updating the table is triggered as a result of the notification.
 8. A system, comprising at least one computing device configured to implement one or more services, the one or more services: for a data set in a NoSQL database, receives a query pair that comprises a query object for the data set and a basis, the query object being associated with an index of the data set; creates a table for the data set, the table being searchable based at least in part on a set of records of the basis, the table having a primary key comprising a hash key and a hash range; and makes the table available to be searched to yield a yielded record that belongs to the query object of the query pair.
 9. The system of claim 8, wherein the one or more services: receives a request to query the data set; identifies the table based at least in part on the basis; searches the table to yield the record; and provides the record.
 10. The system of claim 9, wherein the query specifies the basis.
 11. The system of claim 8, wherein the primary key comprises a hash key and a hash range.
 12. The system of claim 8, wherein the one or more services: receives an update to the data set; and updates the table according to the update.
 13. The system of claim 8, wherein the one or more services: transmits a notification of an update to the data set; and triggers updating the table as a result of the notification.
 14. The system of claim 13, wherein the one or more services comprises a queue service that receives the notification.
 15. A non-transitory computer-readable storage medium having collectively stored thereon executable instructions that, as a result of execution by one or more processors of a computer system, cause the computer system to at least: for a data set that is indexed by a plurality of indices, receive a request to create a search index for querying the data set, the request specifying a query object and a basis for query for the data set, the query object being a data field to become searchable based at least in part on the basis for the query; construct a table in a database, the table having the basis as a primary key and the query object as a secondary index; and make the table available to service queries.
 16. The non-transitory computer-readable storage medium of claim 15, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to further: receive a request to query the data set; identify the table based at least in part on the basis; search the table to yield a record of the secondary index; and provide the record of the secondary index.
 17. The non-transitory computer-readable storage medium of claim 15, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to further: provide a web-based interface through which updates to the data set can be effected.
 18. The non-transitory computer-readable storage medium of claim 15, wherein the primary key comprises a hash key and a hash range.
 19. The non-transitory computer-readable storage medium of claim 15, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to further transmit a notification of an update to the data set to cause the table to be updated.
 20. The non-transitory computer-readable storage medium of claim 15, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to: receive an update to the data set; and update the table according to the update.